15 research outputs found
Learning to Collaborate by Grouping: a Consensus-oriented Strategy for Multi-agent Reinforcement Learning
Multi-agent systems require effective coordination between groups and
individuals to achieve common goals. However, current multi-agent reinforcement
learning (MARL) methods primarily focus on improving individual policies and do
not adequately address group-level policies, which leads to weak cooperation.
To address this issue, we propose a novel Consensus-oriented Strategy (CoS)
that emphasizes group and individual policies simultaneously. Specifically, CoS
comprises two main components: (a) the vector quantized group consensus module,
which extracts discrete latent embeddings that represent the stable and
discriminative group consensus, and (b) the group consensus-oriented strategy,
which integrates the group policy using a hypernet and the individual policies
using the group consensus, thereby promoting coordination at both the group and
individual levels. Through empirical experiments on cooperative navigation
tasks with both discrete and continuous spaces, as well as Google research
football, we demonstrate that CoS outperforms state-of-the-art MARL algorithms
and achieves better collaboration, thus providing a promising solution for
achieving effective coordination in multi-agent systems.
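The vector-quantized consensus module can be sketched as a VQ-VAE-style codebook lookup. This is an illustrative toy, not the paper's implementation: the function name `quantize`, the codebook contents, and the toy data are assumptions. Each agent's continuous latent is snapped to its nearest discrete codeword, which plays the role of the stable, shared group consensus.

```python
import numpy as np

def quantize(latents, codebook):
    """Map continuous latent vectors to their nearest codebook entries.

    latents:  (n_agents, d) continuous embeddings from each agent's encoder
    codebook: (k, d) discrete consensus codewords (learnable in practice)
    Returns the quantized embeddings and the chosen codeword indices.
    """
    # Squared Euclidean distance from every latent to every codeword.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = dists.argmin(axis=1)  # nearest codeword per agent
    return codebook[idx], idx

# Toy example: 3 agents, 2-D latents, 4 codewords.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
latents = codebook[[1, 1, 3]] + 0.05  # small perturbation of codewords 1, 1, 3
quantized, idx = quantize(latents, codebook)  # idx -> [1, 1, 3]
```

Because agents 0 and 1 map to the same codeword, they share an identical discrete consensus embedding, which is the property the module exploits for group-level coordination.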
Learning Agent Communication under Limited Bandwidth by Message Pruning
Communication is a crucial factor for the big multi-agent world to stay
organized and productive. Recently, Deep Reinforcement Learning (DRL) has been
applied to learn the communication strategy and the control policy for multiple
agents. However, the practical \emph{\textbf{limited bandwidth}} in multi-agent
communication has been largely ignored by the existing DRL methods.
Specifically, many methods keep sending messages incessantly, which consumes
too much bandwidth. As a result, they are inapplicable to multi-agent systems
with limited bandwidth. To handle this problem, we propose a gating mechanism
to adaptively prune less beneficial messages. We evaluate the gating mechanism
on several tasks. Experiments demonstrate that it can prune a lot of messages
with little impact on performance. In fact, the performance may be greatly
improved by pruning redundant messages. Moreover, the proposed gating mechanism
is applicable to several previous methods, equipping them with the ability to
address bandwidth-restricted settings.
Comment: accepted as a regular paper with poster presentation at AAAI 2020.
arXiv admin note: text overlap with arXiv:1903.0556
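A minimal sketch of such a gating mechanism (illustrative only; the weights, threshold, and function name are assumptions, not the paper's architecture): each agent scores its own hidden state with a learned gate, and only agents whose gate value clears the threshold broadcast a message, so low-value messages are pruned and bandwidth is saved.

```python
import numpy as np

def gate_messages(hidden_states, w, b, threshold=0.5):
    """Prune messages with a per-agent sigmoid gate (toy sketch)."""
    logits = hidden_states @ w + b             # one scalar gate per agent
    probs = 1.0 / (1.0 + np.exp(-logits))      # sigmoid
    send = probs > threshold                   # binary prune decision
    # Pruned agents contribute a zero (empty) message.
    messages = np.where(send[:, None], hidden_states, 0.0)
    return messages, send

# 4 agents with 3-dim hidden states; toy weights so agents 0 and 2 send.
h = np.array([[ 2.0, 0.0, 0.0],
              [-2.0, 0.0, 0.0],
              [ 3.0, 0.0, 0.0],
              [-1.0, 0.0, 0.0]])
w = np.array([1.0, 0.0, 0.0])
messages, send = gate_messages(h, w, b=0.0)  # send -> [True, False, True, False]
```

In training one would make the binary decision differentiable (e.g. with a straight-through or relaxed estimator), but the inference-time behavior is just this thresholded gate.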
Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function -- with Real Applications in Traffic Domain
The previous state-of-the-art (SOTA) method achieved a remarkable execution
accuracy on the Spider dataset, which is one of the largest and most diverse
datasets in the Text-to-SQL domain. However, during our reproduction of the
business dataset, we observed a significant drop in performance. We examined
the differences in dataset complexity, as well as the clarity of questions'
intentions, and assessed how those differences could impact the performance of
prompting methods. Subsequently, we develop a more adaptable and more general
prompting method, involving mainly query rewriting and SQL boosting, which
respectively transform vague information into exact and precise information and
enhance the SQL itself by incorporating execution feedback and the query
results from the database content. In order to prevent information gaps, we
include the comments, value types, and value samples for columns as part of the
database description in the prompt. Our experiments with Large Language Models
(LLMs) illustrate the significant performance improvement on the business
dataset and prove the substantial potential of our method. In terms of
execution accuracy on the business dataset, the SOTA method scored 21.05, while
our approach scored 65.79. As a result, our approach achieved a notable
performance improvement even when using a less capable pre-trained language
model. Last but not least, we also explore the Text-to-Python and
Text-to-Function options, and we deeply analyze the pros and cons among them,
offering valuable insights to the community.
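The enriched database description can be sketched as a simple prompt builder. This is a hypothetical illustration of the idea, not the paper's code: the table name, column schema, and helper `build_schema_prompt` are all assumptions. For each column, the prompt includes a comment, the value type, and a few sampled values, closing the information gap between the question and the database content.

```python
def build_schema_prompt(table, columns):
    """Render a table description with comments, types, and value samples."""
    lines = [f"Table: {table}"]
    for name, col in columns.items():
        samples = ", ".join(map(str, col["samples"]))
        lines.append(f"- {name} ({col['type']}): {col['comment']}, "
                     f"e.g. {samples}")
    return "\n".join(lines)

# Hypothetical traffic-domain table for illustration.
columns = {
    "road_id": {"type": "INTEGER", "comment": "unique road segment id",
                "samples": [101, 102]},
    "avg_speed": {"type": "REAL", "comment": "mean speed in km/h",
                  "samples": [42.5, 37.0]},
}
prompt = build_schema_prompt("traffic_flow", columns)
```

The resulting block is prepended to the question before it is sent to the LLM; execution feedback and query results would then be appended in later boosting rounds.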
Neighborhood Cognition Consistent Multi-Agent Reinforcement Learning
Social psychology and real experiences show that cognitive consistency plays
an important role in keeping human society in order: if people have a more
consistent cognition about their environments, they are more likely to achieve
better cooperation. Meanwhile, only cognitive consistency within a neighborhood
matters because humans only interact directly with their neighbors. Inspired by
these observations, we take the first step to introduce \emph{neighborhood
cognitive consistency} (NCC) into multi-agent reinforcement learning (MARL).
Our NCC design is quite general and can be easily combined with existing MARL
methods. As examples, we propose neighborhood cognition consistent deep
Q-learning and Actor-Critic to facilitate large-scale multi-agent cooperation.
Extensive experiments on several challenging tasks (i.e., packet routing, wifi
configuration, and Google football player control) justify the superior
performance of our methods compared with state-of-the-art MARL approaches.
Comment: Accepted by AAAI 2020 with oral presentation
(https://aaai.org/Conferences/AAAI-20/wp-content/uploads/2020/01/AAAI-20-Accepted-Paper-List.pdf).
Since AAAI 2020 has started, I have the right to distribute this paper on
arXiv.
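One simple way to realize a neighborhood consistency term is to pull each agent's cognition vector toward the mean cognition of its neighborhood. This is an illustrative penalty under that assumption, not the paper's exact objective; the function name and toy data are hypothetical.

```python
import numpy as np

def ncc_loss(cognitions, neighbors):
    """Toy neighborhood cognitive consistency penalty.

    cognitions: (n_agents, d) cognition vectors
    neighbors:  dict mapping agent index -> list of neighbor indices
    Each agent is penalized for deviating from its neighborhood's mean.
    """
    loss = 0.0
    for i, nbrs in neighbors.items():
        group = cognitions[[i] + nbrs]
        target = group.mean(axis=0)              # shared neighborhood cognition
        loss += ((cognitions[i] - target) ** 2).sum()
    return loss / len(neighbors)

cog = np.array([[1.0, 0.0],
                [1.0, 0.0],
                [0.0, 1.0]])
neighbors = {0: [1], 1: [0], 2: [0, 1]}
# Agents 0 and 1 already agree, so only agent 2's term is nonzero.
loss = ncc_loss(cog, neighbors)
```

Minimizing such a term alongside the usual Q-learning or Actor-Critic loss encourages locally consistent cognition, since humans (and here, agents) interact directly only with their neighbors.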
Structural relational inference actor-critic for multi-agent reinforcement learning
Multi-agent reinforcement learning (MARL) is essential for a wide range of
high-dimensional scenarios and complicated tasks with multiple agents. Many
attempts have been made for agents with prior domain knowledge and predefined
structure. However, the interaction relationships between agents in a
multi-agent system (MAS) are usually unknown, and previous methods could not
tackle dynamic activities in an ever-changing environment. Here we propose a
multi-agent Actor-Critic algorithm called Structural Relational Inference
Actor-Critic (SRI-AC), which is based on the framework of centralized training
and decentralized execution. SRI-AC utilizes the latent codes in a variational
autoencoder (VAE) to represent interactions between paired agents, and the
reconstruction error is based on a graph neural network (GNN). With this
framework, we test whether the reinforcement learning agents can form an
interpretable structure while achieving better performance in both cooperative
and competitive scenarios. The results indicate that SRI-AC can be applied to
complex dynamic environments to find an interpretable structure while
obtaining better performance compared to baseline algorithms.
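The pairwise interaction inference can be sketched as scoring every ordered agent pair and picking the most likely interaction type. This is only illustrative of the relational-inference idea (the linear scorer, edge types, and toy weights are assumptions; the paper uses a VAE with a GNN decoder):

```python
import numpy as np

def infer_edges(states, w):
    """Score every ordered agent pair and pick an interaction type (toy)."""
    n, d = states.shape
    edges = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            pair = np.concatenate([states[i], states[j]])
            logits = pair @ w                    # one logit per edge type
            e = np.exp(logits - logits.max())
            probs = e / e.sum()                  # softmax over edge types
            edges[(i, j)] = int(probs.argmax())
    return edges

states = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
# Toy weights with 2 edge types; chosen so the two directions differ.
w = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0]])
edges = infer_edges(states, w)
```

In the actual model the inferred edge types would parameterize the GNN used for reconstruction, and the latent edge posterior is what makes the learned structure interpretable.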
Stackelberg Decision Transformer for Asynchronous Action Coordination in Multi-Agent Systems
Asynchronous action coordination presents a pervasive challenge in
Multi-Agent Systems (MAS), which can be represented as a Stackelberg game (SG).
However, the scalability of existing Multi-Agent Reinforcement Learning (MARL)
methods based on SG is severely constrained by network structures or
environmental limitations. To address this issue, we propose the Stackelberg
Decision Transformer (STEER), a heuristic approach that resolves the
difficulties of hierarchical coordination among agents. STEER efficiently
manages decision-making processes in both spatial and temporal contexts by
incorporating the hierarchical decision structure of SG, the modeling
capability of autoregressive sequence models, and the exploratory learning
methodology of MARL. Our research contributes to the development of an
effective and adaptable asynchronous action coordination method that can be
widely applied to various task types and environmental configurations in MAS.
Experimental results demonstrate that our method can converge to Stackelberg
equilibrium solutions and outperforms other existing methods in complex
scenarios.
Comment: 11 pages, 7 figures
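The Stackelberg game structure underlying this line of work can be made concrete with a pure-strategy equilibrium in a toy bimatrix game: the leader commits to an action first, anticipating the follower's best response. This illustrates the hierarchical decision structure STEER builds on, not the STEER algorithm itself; the payoff matrices are hypothetical.

```python
import numpy as np

def stackelberg(leader_payoff, follower_payoff):
    """Pure-strategy Stackelberg equilibrium of a bimatrix game.

    The leader picks the row whose payoff is best, assuming the follower
    then best-responds with the column maximizing its own payoff.
    """
    best = None
    for a in range(leader_payoff.shape[0]):
        b = int(follower_payoff[a].argmax())     # follower's best response
        if best is None or leader_payoff[a, b] > leader_payoff[best]:
            best = (a, b)
    return best

# Toy 2x2 game: committing to row 1 induces the follower to play column 1,
# giving the leader 3 instead of the 2 it would get from row 0.
L = np.array([[2.0, 4.0],
              [1.0, 3.0]])
F = np.array([[1.0, 0.0],
              [0.0, 1.0]])
eq = stackelberg(L, F)  # -> (1, 1)
```

MARL methods in this family learn such leader-follower orderings among agents rather than enumerating them, which is where the autoregressive sequence model in STEER comes in.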